
Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database

Author: B Mons
BA Bohm
EF Codd
J Giles
Kazuhiro Suwa
Masanori Arita
R Ierusalimschy
SL Salzberg
T Tokimatsu
Y Shinbo
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. In biology, by contrast, the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases can record conflicts or ambiguities (e.g. a protein pair that does and does not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results: To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a flavonoid database with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. The structured part is subject to text-string searches to realize relational operations. The system was written in PHP as an extension of MediaWiki. All modifications are open-source and publicly available. Conclusion: This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated.
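
A minimal sketch of how a text-string search over the structured part of a page can act as a relational selection. The abstract does not give the page layout or command syntax, so the `| field = value` layout, the sample pages, and the function names `parse` and `select` are all assumptions made for illustration; the actual extension is written in PHP and may work quite differently.

```python
import re

# Hypothetical half-formatted pages: free text plus "| field = value" lines.
PAGES = {
    "Quercetin": "Any free-form notes can go here.\n| class = flavonol\n| species = Allium cepa\n",
    "Apigenin": "More unstructured commentary.\n| class = flavone\n| species = Petroselinum crispum\n",
}

# Matches one structured line and captures the field name and its value.
FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", re.MULTILINE)


def parse(page_text):
    """Extract the structured fields of a page; free text is ignored."""
    return dict(FIELD.findall(page_text))


def select(pages, field, pattern):
    """Return titles of pages whose `field` matches the regex `pattern`."""
    rx = re.compile(pattern)
    return [title for title, text in pages.items()
            if rx.search(parse(text).get(field, ""))]


flavonols = select(PAGES, "class", r"^flavonol$")
```

The free text stays unconstrained, while only the structured lines are queried; that is the trade-off the abstract describes as "half-formatted".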

Springer - Publisher Connector

De Novo DNA Assembly with a Genetic Algorithm Finds Accurate Genomes Even with Suboptimal Fitness

Author: A Nebro
C Ip
CS Chin
E Alba
K Bradnam
KJ Räihä
MS Poptsova
R Parsons
RJ Parsons
RL Warren
SL Salzberg
Y Cherukuri
Publication venue: Springer
Publication date: 01/04/2017

University of Twente Research Information

Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps

Author: AL Delcher
Aleksey V. Zimin
B Ewing
Brian R. Hunt
Cevat Ustun
EW Myers
GG Sutton
James R. White
James Yorke
JC Mullikin
M Roberts
Michael Roberts
Neil Hall
P Green
P Havlak
Paul Havlak
S Aparicio
S Batzoglou
S Schwartz
SL Salzberg
Wayne Hayes
X Huang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2008

The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps
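
The "seven times better" figure can be checked from the numbers quoted in the abstract. The sketch below takes quality as the inverse of (error rate × fraction of finished sequence missed), as the abstract defines it; the function name `assembly_quality` is an assumption for illustration, not something from the paper.

```python
def assembly_quality(errors_per_10kb, coverage_pct):
    """Inverse of (per-base error rate x fraction of finished sequence missed)."""
    error_rate = errors_per_10kb / 10_000
    missed = 1.0 - coverage_pct / 100.0
    return 1.0 / (error_rate * missed)


atlas = assembly_quality(4.5, 93.4)  # Nov. 2002 Atlas assembly
umd = assembly_quality(1.1, 96.3)    # assembly using reliable overlaps

# (4.5 * 6.6) / (1.1 * 3.7) is about 7.3, consistent with "seven times better".
ratio = umd / atlas
```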

CiteSeerX

Public Library of Science (PLOS)

eScholarship - University of California

Caltech Authors

Ori-Finder: A web-based system for finding oriCs in unannotated bacterial genomes

Author: A Grigoriev
A Necsulea
AC Frank
Chun-Ting Zhang
F Gao
FB Guo
Feng Gao
JP Allewalt
JR Lobry
NP Robinson
P Mackiewicz
P Worning
R Zhang
SL Salzberg
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Chromosomal replication is the central event in the bacterial cell cycle. Identification of replication origins (oriCs) is necessary for almost all newly sequenced bacterial genomes. Given the increasing pace of genome sequencing, the currently available software for predicting oriCs, however, still leaves much to be desired. Therefore, the increasing availability of genome sequences calls for improved software to identify oriCs in newly sequenced and unannotated bacterial genomes. Results: We have developed Ori-Finder, an online system for finding oriCs in bacterial genomes based on an integrated method comprising the analysis of base composition asymmetry using the Z-curve method, distribution of DnaA boxes, and the occurrence of genes frequently close to oriCs. The program can also deal with unannotated genome sequences by integrating the gene-finding program ZCURVE 1.02. The predicted results are exported to an HTML report, which offers convenient views of the results in both graphical and tabular formats. Conclusion: A web-based system to predict replication origins of bacterial genomes has been presented here. Based on this system, oriC regions have been predicted for the bacterial genomes currently available in GenBank. It is hoped that Ori-Finder will become a useful tool for the identification and analysis of oriCs in both bacterial and archaeal genomes.
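
For readers unfamiliar with the Z-curve, the sketch below computes its three cumulative components using the standard definition: x tracks purine versus pyrimidine, y amino versus keto, and z weak versus strong hydrogen bonding. It illustrates the base-composition asymmetry signal only; how Ori-Finder combines it with DnaA boxes and gene positions is not described in the abstract and is not reproduced here. The function name `z_curve` is an assumption.

```python
# Per-base steps for the standard Z-curve components:
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak H-bond (A, T) vs strong H-bond (G, C)
STEPS = {
    "A": (1, 1, 1),
    "G": (1, -1, -1),
    "C": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the cumulative (x, y, z) point after each base of `seq`."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        # Ambiguous bases (e.g. N) leave the curve unchanged.
        dx, dy, dz = STEPS.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


curve = z_curve("AGCT")
```

On a real genome, extrema of such cumulative curves are the kind of asymmetry feature that origin-prediction methods look for.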

Springer - Publisher Connector

arXiv.org e-Print Archive

Safe and complete contig assembly via omnitigs

Author: A Bankevich
A Guénoche
AR Rubinov
AS Motahari
C Kingsford
D Haussler
DR Zerbino
E Kapun
ES Lander
G Bresler
G Narzisi
I Lysov
JD Kececioglu
JR Miller
JT Simpson
K Lam
K Sahlin
L Salmela
M Boetzer
N Nagarajan
N Vyahhi
P Medvedev
PA Pevzner
R Chikhi
R Luo
R Uricaru
RM Idury
SL Salzberg
Publication date: 16/08/2016

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph G (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201
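
As context for the comparison with unitigs, here is a minimal sketch of unitig extraction from a de Bruijn graph, taken as maximal non-branching paths. It is the baseline the abstract compares against, not the omnitig algorithm, which the abstract does not spell out. The function name `unitigs`, the choice of (k-1)-mers as nodes, and the toy reads are assumptions; isolated cycles and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def unitigs(reads, k):
    """Spell the maximal non-branching paths of the de Bruijn graph of `reads`."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}

    # Nodes are (k-1)-mers; each distinct k-mer is an edge prefix -> suffix.
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    for kmer in sorted(kmers):
        out_edges[kmer[:-1]].append(kmer[1:])
        in_degree[kmer[1:]] += 1
    nodes = set(out_edges) | set(in_degree)

    def non_branching(node):
        return in_degree.get(node, 0) == 1 and len(out_edges.get(node, [])) == 1

    paths = []
    for node in sorted(nodes):
        if non_branching(node):
            continue  # interior nodes are reached from a path start
        for nxt in out_edges.get(node, []):
            path = node + nxt[-1]
            # Extend while the walk has exactly one way in and one way out.
            while non_branching(nxt):
                nxt = out_edges[nxt][0]
                path += nxt[-1]
            paths.append(path)
    return paths


contigs = unitigs(["AACGT", "AACGA"], k=3)
```

In this toy input the two reads diverge after `AACG`, so the walk stops at the branching node `CG` and each branch is reported separately.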

Public Library of Science (PLOS)

Genome Assembly Has a Major Impact on Gene Content: A Comparison of Annotation in Two Bos Taurus Assemblies

Author: Alexander Souvorov
AV Zimin
DA Wheeler
DL Wheeler
E Pennisi
IH Consortium
J Wang
JC Venter
K Eilbeck
K Liolios
L Florea
Liliana Florea
M Clamp
M Nowrousian
M Pertea
MC Schatz
Najib M. El-Sayed
R Li
RA Gibbs
SF Altschul
SL Salzberg
Steven L. Salzberg
TD Wu
Theodore S. Kalbfleisch
WJ Kent
WR Pearson
Publication venue: Public Library of Science
Publication date: 22/06/2011

Gene and SNP annotation are among the first and most important steps in analyzing a genome. As the number of sequenced genomes continues to grow, a key question is: how does the quality of the assembled sequence affect the annotations? We compared the gene and SNP annotations for two different Bos taurus genome assemblies built from the same data but with significant improvements in the later assembly. The same annotation software was used for annotating both sequences. While some annotation differences are expected even between high-quality assemblies such as these, we found that a staggering 40% of the genes (>9,500) varied significantly between assemblies, due in part to the availability of new gene evidence but primarily to genome mis-assembly events and local sequence variations. For instance, although the later assembly is generally superior, 660 protein coding genes in the earlier assembly are entirely missing from the later genome's annotation, and approximately 3,600 (15%) of the genes have complex structural differences between the two assemblies. In addition, 12–20% of the predicted proteins in both assemblies have relatively large sequence differences when compared to their RefSeq models, and 6–15% of bovine dbSNP records are unrecoverable in the two assemblies. Our findings highlight the consequences of genome assembly quality on gene and SNP annotation and argue for continued improvements in any draft genome sequence. We also found that tracking a gene between different assemblies of the same genome is surprisingly difficult, due to the numerous changes, both small and large, that occur in some genes. As a side benefit, our analyses helped us identify many specific loci for improvement in the Bos taurus genome assembly

Automated seeding of specialised wiki knowledgebases with BioKb

Author: Ann Hedley
AR Pico
B Mons
Donald R Dunbar
F Zhou
GA Viswanathan
H Ogata
IM Sauer
John J Mullins
Jonathan R Manning
JW Huss
K Yager
R Hoffmann
S Pawlicki
SL Salzberg
TH Stokes
W Sewell
X Wang
Publication venue: BioMed Central
Publication date: 01/01/2009

Springer - Publisher Connector

Edinburgh Research Explorer

Authorship Analysis Approaches

Author: A Abbasi
A Abbasi
DI Holmes
E Stamatatos
F Sebastiani
GU Yule
H Baayen
J Diederich
J Rudman
JF Burrows
JR Quinlan
LM Manevitz
M Koppel
N Cristianini
O De Vel
R Agrawal
R Zheng
SE Robertson
SL Salzberg
T Kucukyilmaz
Publication venue: Springer Science and Business Media LLC
Publication date: 05/12/2020

This chapter presents an overview of authorship analysis from multiple standpoints. It includes historical perspective, description of stylometric features, and authorship analysis techniques and their limitations

ZU Scholars (Zayed University)